A Tale of Two Structures: Do LLMs Capture the Fractal Complexity of Language?
Ibrahim Alabdulmohsin, Andreas Steiner
Language exhibits a fractal structure in its information-theoretic complexity (i.e., bits per token), with self-similarity across scales and long-range dependence (LRD). In this work, we investigate whether large language models (LLMs) can replicate such fractal characteristics and identify conditions, such as the decoding temperature and prompting method, under which they may fail. Moreover, we find that the fractal parameters observed in natural language fall within a narrow range, whereas those of LLM output vary widely, suggesting that fractal parameters could help detect a non-trivial portion of LLM-generated texts. Notably, these findings, and many others reported in this work, are robust to the choice of architecture, e.g., Gemini 1.0 Pro, Mistral-7B, and Gemma-2B. We also release a dataset comprising over 240,000 articles generated by various LLMs (both pretrained and instruction-tuned) with different decoding temperatures and prompting methods, along with their corresponding human-generated texts. We hope this work highlights the complex interplay between fractal properties, prompting, and statistical mimicry in LLMs, offering insights for generating, evaluating, and detecting synthetic texts.
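To make the notion of long-range dependence concrete, the following sketch estimates the Hurst parameter of a numeric sequence (such as a bits-per-token series) via the classical rescaled-range (R/S) statistic. This is a generic illustration, not the paper's exact estimation procedure; the function names, window sizes, and the Gaussian toy input are all assumptions. H near 0.5 indicates an uncorrelated series, while H above 0.5 indicates long-range dependence.

```python
# Hedged sketch: Hurst-parameter estimation via the rescaled-range (R/S)
# statistic. The per-token "surprise" series here is stand-in white noise;
# in practice it would come from a language model's log-probabilities.
import math
import random


def rescaled_range(series):
    """R/S statistic of one window: range of the mean-adjusted
    cumulative sum, divided by the window's standard deviation."""
    n = len(series)
    mean = sum(series) / n
    cum = cmin = cmax = 0.0
    for x in series:
        cum += x - mean
        cmin = min(cmin, cum)
        cmax = max(cmax, cum)
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    return (cmax - cmin) / std if std > 0 else 0.0


def hurst_exponent(series, window_sizes=(16, 32, 64, 128, 256)):
    """Average R/S over non-overlapping windows of each size, then fit
    log E[R/S] against log n; the least-squares slope estimates H."""
    xs, ys = [], []
    for n in window_sizes:
        rs_vals = [
            rescaled_range(series[i:i + n])
            for i in range(0, len(series) - n + 1, n)
        ]
        rs_vals = [r for r in rs_vals if r > 0]
        if rs_vals:
            xs.append(math.log(n))
            ys.append(math.log(sum(rs_vals) / len(rs_vals)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    random.seed(0)
    # I.i.d. Gaussian noise: the estimate should land near 0.5
    # (R/S has a known mild upward bias at these sample sizes).
    noise = [random.gauss(0.0, 1.0) for _ in range(4096)]
    print(round(hurst_exponent(noise), 2))
```

The abstract's observation that human text occupies a narrow band of fractal parameters while LLM output varies widely suggests comparing such an estimate across corpora; this sketch covers only the estimation step.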